An Unsupervised Approach for Content-Based Clustering of Emails Into Spam and Ham Through Multiangular Feature Formulation


Abstract

The rapid growth of spam email attacks, and the inherent malicious dynamism within those attacks on a range of social, personal and business activities, warrants an intelligent and automated anti-spam framework. Attempts at malware propagation, identity theft, sensitive data pilfering, and monetary as well as reputational damage are sharply increasing, endangering the privacy of the victim. Current anti-spam solutions are rather incomplete when the multidimensional features of email are taken into account. We believe that an anti-spam methodology based on Artificial Intelligence, especially unsupervised machine learning, is the way forward. This research investigates the application of unsupervised learning for clustering spam and ham emails. The overall goal is to develop a framework that depends solely on unsupervised methodologies, through a clustering approach that includes multiple algorithms, primarily using the email content (body) and the subject header. The clustering was done on a novel binary dataset of 22,000 entries of spam and ham emails, composed of ten features (reduced from eleven after feature reduction). Seven out of these ten features are unique to this study, engineered to represent the impactful analytical characteristics of emails from a multiangular point of view. Out of the five different clustering algorithms investigated in this work, OPTICS produced the optimum clustering, demonstrating 0.26% higher average efficacy than its nearest performer, DBSCAN. The balanced accuracy of DBSCAN was found to be ≈75.76%.
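The abstract reports balanced accuracy for an unsupervised clustering, which requires mapping cluster IDs onto the spam and ham labels before scoring. The sketch below is one plausible way to do that, not the authors' evaluation code: the function names, the toy data, and the "try both mappings and keep the best" rule are all assumptions made for illustration, and noise points (which density-based algorithms such as DBSCAN and OPTICS can emit) are not handled.

```python
from itertools import permutations


def balanced_accuracy(y_true, y_pred, classes=("ham", "spam")):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def score_clustering(y_true, cluster_ids, classes=("ham", "spam")):
    """Try every cluster-to-class assignment and keep the best balanced accuracy.

    Assumption (hypothetical evaluation rule): there are exactly as many
    clusters as classes, two here, and every point belongs to a cluster.
    """
    clusters = sorted(set(cluster_ids))
    best = 0.0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        y_pred = [mapping[c] for c in cluster_ids]
        best = max(best, balanced_accuracy(y_true, y_pred, classes))
    return best


# Toy data (invented): 6 ham and 2 spam emails, clustered into groups 0 and 1.
y_true = ["ham"] * 6 + ["spam"] * 2
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1]
print(score_clustering(y_true, cluster_ids))
```

On this toy input one ham email lands in the spam cluster, so ham recall is 5/6 and spam recall is 2/2, giving a balanced accuracy of 11/12 (about 0.917). Plain accuracy would be 7/8; the balanced variant weights the small spam class as heavily as the large ham class.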


Similar resources

Spam Image Clustering for Identifying Common Sources of Unsolicited Emails

In this article, we propose a spam image clustering approach that uses data mining techniques to study the image attachments of spam emails, with the goal of helping the investigation of spam clusters or phishing groups. Spam images are first modeled based on their visual features. In particular, the foreground text layout, foreground picture illustrations and background textures are analyzed. Afte...


An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using short message service (SMS) as an essential and straightforward communication service on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem for mobile subscribers; this worries telecomm...


Fast and Effective Clustering of Spam Emails Based on Structural Similarity

Every year, spam emails impose extremely heavy costs in terms of time, storage space and money on both private users and companies. Finding and prosecuting spammers and any other stakeholders behind spam emails should make it possible to tackle the root of the problem directly. To facilitate such a difficult analysis, which should be performed on large amounts of unclassified raw emails, in this paper we propose a fra...


Feature Selection-model-based Content Analysis for Combating Web Spam

With the growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Quick, accurate and high-quality information mining is the core concern of successful search companies. Likewise, spammers try to manipulate IR systems to fulfil their stealthy needs. Spamdexing (also known as web spamming) is one of the spamming techniques of adversari...


A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that is harmful to communications around the world. Spam is a growing problem in personal email, so detecting it is essential. Machine learning is very useful for this problem: owing to its adaptive nature, it shows good results in learning all the requisite patterns for classification. Nonetheless, in spam detection, there are a large num...



Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3116128